Phrase-Based Statistical Machine Translation Using Approximate Matching
Abstract
Phrase-based statistical models are among the most competitive pattern-recognition approaches to machine translation. In these models, the source sentence is segmented into phrases, and each phrase is then translated using a stochastic dictionary. One shortcoming of the phrase-based model is its limited generalization capability: if a sequence of words has not been seen in training, it cannot be translated as a whole phrase. In this paper we try to overcome this drawback. The basic idea is that if a source phrase is not in the dictionary (that is, it was not seen in training), we look for the most similar phrase in the dictionary and try to adapt its translation to the source phrase. We use the well-known edit distance as the measure of similarity. We present results on an English-Spanish task (XRCE).
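The lookup step described in the abstract can be illustrated with a minimal sketch. It assumes a word-level Levenshtein distance and a plain dictionary as the phrase table; the function names `edit_distance` and `closest_phrase` and the toy entries are hypothetical and do not come from the paper, which may define similarity and the search differently.

```python
def edit_distance(a, b):
    """Word-level Levenshtein distance between two token sequences."""
    prev = list(range(len(b) + 1))  # distance from the empty prefix of a
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            cost = 0 if x == y else 1
            curr.append(min(prev[j] + 1,          # delete x
                            curr[j - 1] + 1,      # insert y
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def closest_phrase(source, phrase_table):
    """Return (known phrase, its translation, distance) nearest to `source`."""
    src = source.split()
    best = min(phrase_table, key=lambda p: edit_distance(src, p.split()))
    return best, phrase_table[best], edit_distance(src, best.split())


# Hypothetical toy phrase table (English -> Spanish), not taken from the paper.
phrase_table = {
    "the red printer": "la impresora roja",
    "press the button": "pulse el botón",
    "open the cover": "abra la cubierta",
}

# "the blue printer" is unseen; the nearest known phrase differs by one word.
match, translation, dist = closest_phrase("the blue printer", phrase_table)
print(match, "|", translation, "|", dist)
```

The sketch stops at retrieval. How the retrieved translation is adapted to the unseen source phrase is not specified in the abstract, so that step is left out here.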
Similar resources
Hierarchical Phrase-Based Translation with Suffix Arrays
A major engineering challenge in statistical machine translation systems is the efficient representation of extremely large translation rulesets. In phrase-based models, this problem can be addressed by storing the training data in memory and using a suffix array as an efficient index to quickly look up and extract rules on the fly. Hierarchical phrase-based translation introduces the added wrink...
An Improved Patent Machine Translation System Using Adaptive Enhancement for NTCIR-10 PatentMT Task
This paper describes the work that we conducted for the Chinese-English (CE) task of the NTCIR-10 patent machine translation evaluation. We built standard phrase-based and hierarchical phrase-based statistical machine translation (SMT) systems with optimized word segmentation, adaptive language model and improved parameter tuning strategy. Our systems outperform official baselines by approximat...
Partial Matching Strategy for Phrase-based Statistical Machine Translation
This paper presents a partial matching strategy for phrase-based statistical machine translation (PBSMT). Source phrases which do not appear in the training corpus can be translated by word substitution according to partially matched phrases. The advantage of this method is that it can alleviate the data sparseness problem if the amount of bilingual corpus is limited. We incorporate our approac...
Enriching a statistical machine translation system trained on small parallel corpora with rule-based bilingual phrases
In this paper, we present a new hybridisation approach consisting of enriching the phrase table of a phrase-based statistical machine translation system with bilingual phrase pairs matching structural transfer rules and dictionary entries from a shallow-transfer rule-based machine translation system. We have tested this approach on different small parallel corpora scenarios, where pure statistic...
35 rue Saint-Honoré, Fontainebleau
The graph matching problem is among the most important challenges of graph processing, and plays a central role in various fields of pattern recognition. We propose an approximate method for labeled weighted graph matching, based on a convex-concave programming approach which can be applied to the matching of large graphs. This method makes it easy to integrate information on graph label s...
Publication date: 2007